We investigate the computing power of a restricted class of DNA strand displacement structures: 
those that are made of double strands with nicks (interruptions) in the top strand. To preserve this 
structural invariant, we impose restrictions on the single strands they interact with: we consider only 
two-domain single strands consisting of one toehold domain and one recognition domain. We study 
fork and join signal-processing gates based on these structures, and we show that these systems are 
amenable to formalization and to mechanical verification. 

1 Introduction 

Among the many techniques being developed for molecular computing [5], DNA strand displacement has 
been proposed as a mechanism for performing computation with DNA strands EO. In most schemes, 
single-stranded DNA acts as signals and double-stranded (or more complex) DNA structures act as gates. 
Various circuits have been demonstrated experimentally BlOl . The strand displacement mechanism is 
appealing because it is autonomous [4]: once signals and gates are mixed together, computation proceeds 
on its own without further intervention until the gates or signals are depleted (output is often read by 
fluorescence). The energy for computation is provided by the gate structures themselves, which are 
turned into inactive waste in the process. Moreover, the mechanism requires only DNA molecules: no 
organic sources, enzymes, or transcription/translation ingredients are required, and the whole apparatus 
can be chemically synthesized and run in basic wet labs. 

The main aims of this approach are to harness computational mechanisms that can operate at the 
molecular level and produce nano-scale structures under program control, and somewhat separately that 
can intrinsically interface to biological entities [2]. The computational structures that one may easily 
implement this way (without some form of unbounded storage) vary from Boolean networks, to state 
machines, to Petri nets. The last two are particularly interesting because they take advantage of DNA's 
ability to encode symbolic information: they operate on DNA strands that represent abstract signals. 

The fundamental mechanism in many of these schemes is toehold-mediated branch migration and 
strand displacement ifTOl , which implements a basic step of computation. It operates as shown in Figure 
1, where each letter and corresponding segment represents a DNA domain (a sequence of nucleotides, 
C, G, T, A) and each DNA strand is seen as the concatenation of multiple domains. Single strands have 
an orientation; double strands are composed of two single strands with opposite orientation, where the 
bottom strand is the Watson-Crick (C-G, T-A) complement of the top strand. The 'short' domains 
hybridize (bind) reversibly to their complements, while the 'long' domains hybridize irreversibly; the 
exact critical length depends on physical conditions. Distinct letters indicate domains that do not hybridize 
with each other. 

In the first reaction of Figure 1, a short toehold domain t initiates binding between a double strand and 
a single strand. After the (reversible) binding of the toehold, the x domain of the single strand gradually 
replaces the top x strand of the double strand by branch migration. The branching point between the 
two top x domains performs a random walk that eventually leads to displacing the x strand. The final 
detachment of the top x strand makes the whole process essentially irreversible, because there is no 
toehold for the reverse reaction. The second reaction illustrates the case where the top domains do not 
match: then the toehold binds reversibly and no displacement occurs. The third reaction illustrates the 
more detailed situation where the top domains match only initially: the branch migration can proceed 
only up to a certain point and then must revert back to the toehold; hence no displacement occurs and 
the whole reaction reverts. 

Figure 1: Toehold-mediated DNA branch migration and strand displacement 

Figure 2: Examples of allowable single and double strands 

The fourth reaction illustrates a toehold exchange, where a branch migration (of strand tx) leads to a 
displacement (of strand xt), but where the whole process is reversible via a reverse toehold binding and 
branch migration. The first (irreversible) and fourth (reversible) reactions are the fundamental steps that 
can be composed to achieve computation by strand displacement. 



2 Two-domain Signals and Gates 

We now describe some DNA strand displacement structures that emulate, depending on the point of view, 
either chemical reactions or Petri net transitions. Their function is to join input signals and fork output 
signals. To achieve compositionality, so that gates can be composed arbitrarily into larger circuits, it 
is necessary to first fix the structure of the signals. Any given choice of signal structure requires a 
different gate architecture, for example for 4-domain signals [9] (signals composed of 4 segments of 
different function), and 3-domain signals [1]. Here we present a new, streamlined architecture based on 
2-domain signals, where the gates can be combined into arbitrary circuits (including loops), and where 
the waste products do not interfere with the active gates. 

Figure 3: Transducer $T_{xy} \mid tx \to ty$: initial state plus input tx. 

Top-nicked double strands. 

Double-stranded DNA (dsDNA) can have interruptions (nicks) on one strand while remaining connected 
if the opposite strand has enough hold on the area around the nick. We call such structures nicked 
double-stranded DNA (ndsDNA). This excludes any long overhangs or any protrusions from the double 
strand. In particular, we work with top-nicked double strands, where all the nicks are on one strand 
(the top one by convention). A deviation from this simple structure happens fleetingly during branch 
migration, but all the initial and final species we use are ndsDNA. 

We use t for short domains, x,y,z for long domains, and a,b,c for long domains that are meant to 
be privately used by some construction. We write, e.g., tx for a single-stranded DNA (ssDNA) strand 
consisting of a toehold t followed by a domain x, and similarly for xt. We write, e.g., txy for a fully 
complemented double strand consisting of a continuous strand txy at the top and its Watson-Crick 
complement at the bottom. Finally, we write tx^y to indicate the same double strand but with a nick at the top 
between x and y. In the figures, a nick is indicated by an arrowhead and a discontinuity. 

Examples of allowable single and double strands are shown in Figure 2. We assume that domains 
indicated by different letters are distinct, so that, e.g., x does not hybridize with y, zy, yz, ty, or yt. To 
simplify our notation, we use an implicit equivalence illustrated in the bottom part of the figure. Suppose 
we start with a regular double strand, and we nick it at the top (bottom left). Long segments between 
nicks remain attached to the bottom strand, while short toehold segments can detach and reattach (bottom 
right). We regard these reversible states as equivalent; the notation x^t^y then indicates two equivalent 
situations, where the top t is either present or absent, and where t is implicitly exchanged with the 
environment. Hence, we can use x^t^y to indicate an open toehold between x and y, because the toehold 
is available some of the time. This way, we do not need to use separate notations for temporarily occluded 
and temporarily open toeholds, which we would have to regard as equivalent anyway (up to some kinetic 
occlusion effects). 
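The flat notation lends itself to mechanical manipulation. The following is a minimal sketch, assuming single-letter domain names and the single toehold t; the helper names `parse_double` and `open_toeholds` are illustrative and are not part of the paper.

```python
TOEHOLD = "t"  # assumption: one toehold, named t as in the text


def parse_double(text):
    """Split a top-nicked double strand such as 'x^t^y' into top-strand segments.

    '^' marks a nick; each segment becomes a tuple of single-letter domains.
    """
    return tuple(tuple(segment) for segment in text.split("^"))


def open_toeholds(segments):
    """Return the positions of segments that consist of a lone toehold.

    Under the implicit equivalence, a lone top t detaches and reattaches,
    so the bottom toehold at that position is open some of the time.
    """
    return [i for i, seg in enumerate(segments) if seg == (TOEHOLD,)]


strand = parse_double("x^t^y")
print(strand)                 # (('x',), ('t',), ('y',))
print(open_toeholds(strand))  # [1]
```

The tuple-of-tuples form keeps nicks explicit, so the position of every open toehold can be read off without tracking whether the top t happens to be attached.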

Two-domain strand displacement gates. 

All our gates are top-nicked dsDNA and our signals are two-domain ssDNA. This simple setup is more 
expressive than it might appear at first. For example (Figure 3), let us consider a single strand tx as 
encoding a signal, with the strand xt as its cosignal, and consider the problem of constructing a signal 
transducer $T_{xy}$ from a signal tx to a signal ty, with the reduction $T_{xy} \mid tx \to ty$, where | is parallel 
composition of components, and final waste is discarded. All signals share the same toehold t, and 
are distinguished by the long domains x,y,z, etc. As shown in Figure 4, the input tx can initiate a 
signal/cosignal cascade of strand displacements in the left double strand that after two toehold exchanges 
releases a private cosignal at (the segment a is privately used by the $T_{xy}$ transducer, with a distinct a for 
each xy pair). The at cosignal then initiates a backward cascade in the right double strand that releases 
the desired output signal ty at the fourth reaction. The release of ty is reversible, but the gate is then 
locked down by the last two reactions. The locking down of the gate is also used to reabsorb the xt and 
ta strands, by exploiting the x end of the right structure and the a end of the left structure. In the end, 
only unreactive (no exposed toeholds) dsDNA and ssDNA is left. In Figure 4, the initial structures from 
Figure 3 are shown inside rounded rectangles, and the final structures inside squared rectangles. The 
reaction rules are described abstractly in Figure 10. 

Figure 4: Transducer $T_{xy} \mid tx \to ty$ reactions 

Figure 5: Fork $F_{xyz} \mid tx \to ty \mid tz$: initial state plus input tx. 



The structures in Figure 3 can be written in the notation described above as $T_{xy}$ = t^xt^at^a | ta | 
x^ty^ta^t | yt. The auxiliary signal ta contains the private segment a, uniquely joining the two halves 
of $T_{xy}$ transducers, and we can therefore assume that it will not interfere with other gates. The auxiliary 
cosignal yt however contains a public segment y, which is necessary to release the output signal. It is 
therefore important to maintain an invariant that no other gate in the whole system spontaneously absorbs 
yt, or in general any public cosignal, although it may do so in a proper response to inputs. For example, 
a $T_{zy}$ transducer and a $T_{xy}$ transducer may use "each other's" yt cosignal without problem. 

The transducer $T_{xy}$ can be extended easily to a fork gate $F_{xyz}$ such that $F_{xyz} \mid tx \to ty \mid tz$, releasing two 
outputs from one input. This is shown in Figure 5, where the left half of the structure is the same as in 
$T_{xy}$. The fork gate can be extended to a catalytic gate $C_{xyz}$ such that $C_{xyz} \mid tx \mid ty \to ty \mid tz$ (Figure 6). The 
right half of $C_{xyz}$ is unchanged from $F_{xyz}$, except that yt is not required because it is produced by the left 
half. This gate, like the more general join gate discussed next, takes two inputs, but absorbs them only if 
both inputs are present [9]. If only the first input is present, it is returned to the soup by reversibility of 
strand displacement between tx and xt. 

Figure 6: Catalyst $C_{xyz} \mid tx \mid ty \to ty \mid tz$: initial state plus inputs tx and ty. 

Luca Cardelli 

Figure 7: Join $J_{xyz} \mid tx \mid ty \to tz$: initial state plus inputs tx, ty. 

Figure 8: Join $J_{xyz} \mid tx \mid ty \to tz$: final state plus output tz. 

Let us now consider, in Figures 7 and 8, a binary join gate $J_{xyz}$ such that $J_{xyz} \mid tx \mid ty \to tz$ (the 
generalization to additional outputs works as in the fork gate). Each distinct combination of xyz requires 
choosing a distinct private domain connecting the two halves of the gate; this private domain can however 
be shared among a population of gates with the same input and output signals. The main new feature 
in this gate is the additional t^by^t structure that absorbs a signal and a cosignal together, or neither 
separately. Without it, and without the bt, tb components, the join gate would leave behind a yt residual 
(all the other single strands, xt, zt, ta, are reclaimed). Hence t^by^t is a 'garbage collector' turning 
undesired active residuals to waste. It is triggered only after the release of a private strand tb, so that 
the collector does not reclaim an extraneous cosignal yt before the join gate has committed to its inputs. 
Such an extraneous yt could come from a transducer $T_{xy}$, or from another join $J_{uvy}$ (before any input) or 
$J_{yuv}$ (after the first input) causing cross-gate interference, or even from within the same join, as in $J_{xyy}$. 
Removing garbage is important because accumulated garbage slows down future reactions by imposing a 
growing reverse pressure on the desired direction of the reactions. We have designed all gates to remove 
all active garbage, but until now garbage removal did not require additional double strands. The Join 
structure is easily generalized to any number of inputs; for example, Figure 9 shows a 3-input Join with 
collectors. 



Discussion: The double strand restrictions. 

The restriction of allowing only ndsDNA structures has a number of potential advantages. The 
absence of any branching seems inherently more trouble-free than complex structures that can interact 
in unexpected ways through their protruding single-stranded parts. Here all double-stranded structures 
are quiescent (except for receptive toeholds on the bottom strand) and only single-stranded components 
have hybridization potential, eliminating the possibility that the gates themselves may polymerize, or may 
self-interact. These structures also have a simple syntactical representation and simple reduction rules, 
which simplify formal verification. Nothing prevents us from devising precise syntax and reductions for 
more general structures El, and there is no good reason in principle to avoid more complex structures if 
they work well. However, we have shown that our simplified structures already cover a surprising range 
of computation (fork and join gates in populations are equivalent to Petri Nets [1]), and hence one can 
restrict the use of more complex structures to the situations where they are actually needed, or where 
they somehow perform better. 

Figure 9: 3-Join $J_{wxyz} \mid tw \mid tx \mid ty \to tz$: initial state plus inputs tw, tx, ty. 



Discussion: The single strand restrictions. 

Our hybridized structures start as ndsDNA, but we have to ensure that they remain ndsDNA through 
computation (except for transients, i.e., during branch migrations that either revert harmlessly or lead to 
strand displacements). This invariant puts constraints on the allowable single strands. First of all, single 
strands consisting only of long segments are inert because all the double strands are fully complemented 
(except for toeholds), and hence they can be ignored. A single strand of the form xty could bind to a 
double strand of the form x^t^z, leading to a configuration that is stable and is not ndsDNA. Therefore 
our single strands cannot contain substrands of the form xty, and we are left with single strands of the 
form xt* or t*x or t*xt*. The third class could lead to stable configurations with two overlapping 
competing toeholds (t^x^t^y^t with txt and tyt) and hence is ruled out too. Multiple toeholds in sequence 
bind as stably as a long domain, so e.g. xttt would be as bad as the former xty, and they can lead to 
competing toeholds: x^t^t^y with xtt and tty. Hence we do not allow consecutive toeholds in the top 
strands. Similarly, strands with consecutive long segments can lead to stable competition: txy and yzt 
over t^xyz^t. In the end, we are left only with xt or tx, and the only remaining competition is between 
tx and xt over t^x^t, where the stable structures are ndsDNA. A final case to consider is tx and yt over 
t^xy^t: if a single strand is present it binds only reversibly, and if both are present they both bind stably 
and release xy, so the stable structures are always ndsDNA. In fact, t^xy^t is an important configuration 
that seems to add some power: without it we can still implement garbage-collecting join gates, but 
apparently only by using more than one distinct toehold. 



Discussion: The double strand restrictions, revisited. 

We finally have to make sure that no reactive single strands other than t, tx, xt, plus the unreactive x and 
xy, are ever released from double strands during computation. This imposes another restriction on double 
strands: nicks should break the top strand into segments of two domains or less. Otherwise, the double 
strand t^xty^t could release a forbidden single strand xty in presence of tx and yt. (We could still allow 
t^xyz^t, but it would be unreactive.) Hence, we are left with allowable double strands that are nicked 
concatenations of the double-stranded elements x, tx, xt, xy. 
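The restrictions on single and double strands can be checked syntactically. The sketch below is illustrative only: it assumes single-letter domains and the flat '^' notation for nicks, the function names are hypothetical, and it additionally accepts a lone t segment, following the open-toehold notation x^t^y used earlier.

```python
TOEHOLD = "t"  # assumption: one toehold, named t


def is_long(domain):
    return domain != TOEHOLD


def allowed_single(strand):
    """A signal tx or a cosignal xt: exactly one toehold and one long domain."""
    return len(strand) == 2 and sorted(map(is_long, strand)) == [False, True]


def allowed_segment(segment):
    """A top-strand segment between nicks: t, x, tx, xt, or xy."""
    if len(segment) == 1:
        return True
    if len(segment) == 2:
        return is_long(segment[0]) or is_long(segment[1])  # rules out tt
    return False  # three or more domains could release a forbidden strand


def allowed_double(text):
    """Check every segment of a nicked double strand written like 't^xy^t'."""
    return all(allowed_segment(tuple(seg)) for seg in text.split("^"))


print(allowed_double("t^xy^t"))   # True
print(allowed_double("t^xty^t"))  # False: the segment xty has three domains
print(allowed_single("xt"), allowed_single("xty"))  # True False
```

Note that the check is purely about shape: t^xyz^t is rejected here for its three-domain segment even though, as remarked above, it would merely be unreactive.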
Figure 10: The basic reactions (D1,D2 are arbitrary or empty double strands). 

3 Nick Algebra 

In this section we provide a formal framework where we can perform calculations about the evolution 
of systems of top-nicked double strands. Domains are taken either from a finite set of short domains 
(toeholds) or from an unbounded set of long domains ranged over by x,y,z and a,b,c. The set of toeholds 
must be finite (and in practice quite small) because of its reversible-binding assumption that limits length 
and hence cardinality. Designs based on a single toehold can be easily adapted to multiple toeholds to 
increase binding discrimination and efficiency, but the converse is problematic: designs based on distinct 
toeholds may fail if the toeholds are then identified. Here we require only a single distinguished toehold, 
always indicated by the constant t, but it would be easy to generalize to multiple toeholds. 

An infix operator '.' may be used to concatenate domains into single strands; this is often omitted, 
particularly because all our single strands have the form t.x or x.t, which are then usually written tx 
and xt (unless we wish to use long identifiers for domains). Single strands t, x, and x.y remain implicit 
'waste', and are not used in the syntax. 

Double strands are written underlined. We use an infix operator '^' to represent a 'nick' on the 
top strand of a double-stranded sequence, an infix operator '.' (often omitted) to represent the unbroken 
concatenation of top and bottom strands, and $\emptyset$ for the empty double strand. The segments between nicks 
are only single or pair combinations of toeholds and domains. 

A soup $U$ is a finite multiset of single and double strands, with multiset union indicated by '|', and 
with a notation $(\nu x)U$ for domain isolation. The latter indicates that x is not used outside of $U$; this 
allows us to declare private domains locally, and to combine constructions compositionally. In practice, 
it means simply that all the domains indicated by $\nu$ must be chosen distinct when a global system is fixed 
for execution: the algebraic laws for $(\nu x)U$ encode such a guarantee. We also use $U^n$ as an abbreviation 
for n copies of $U$ in parallel ( | ). The resulting algebra is our nick algebra, which is strictly a subset of 
the DSD (DNA Strand Displacement) language Q. 



Definition: Term Syntax 

\[
\begin{array}{lll}
S & ::= \; t.x \mid x.t & \text{Single strand} \\
D & ::= \; \emptyset \mid \underline{t} \mid \underline{x} \mid \underline{t.x} \mid \underline{x.t} \mid \underline{x.y} \mid D\,{}^{\wedge}D & \text{Double strand} \\
U & ::= \; S \mid D \mid U \mid U \mid (\nu x)U & \text{Soup}
\end{array}
\]

The set of public domains $pd(U)$ is the inductively defined set of those domains not bound by $\nu$ 
in $U$; in particular $pd(t.x) = pd(x.t) = pd(\underline{x}) = pd(\underline{t.x}) = pd(\underline{x.t}) = \{x\}$, $pd(\underline{x.y}) = \{x,y\}$, 
$pd(\underline{t}) = pd(\emptyset) = \{\}$, and $pd((\nu x)U) = pd(U) - \{x\}$. Then, $U\{y/x\}$ is the substitution of $y$ for $x$ in $U$, with 
the representative cases $t\{y/x\} = t$, $x\{y/x\} = y$, $z\{y/x\} = z$ for $z \neq x$, $((\nu z)U)\{y/x\} = (\nu z)(U\{y/x\})$ for 
$z \notin \{x,y\}$, $((\nu x)U)\{y/x\} = (\nu x)U$, and $((\nu y)U)\{y/x\} = (\nu z)(U\{z/y\}\{y/x\})$ for a $z \notin pd(U) \cup \{x,y\}$. 

Algebraic equality (a binary congruence relation over the term syntax) is indicated just by = and 
is axiomatized below with the monoid laws of ${}^{\wedge}$, the commutative monoid laws of $\mid$, and the 
scoping laws of $(\nu x)U$ [6]. 



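Definition: Algebraic Equality 

\[
\begin{array}{l}
= \text{ is an equivalence relation} \\
U_1 = U_2,\; U_3 = U_4 \;\Rightarrow\; U_1 \mid U_3 = U_2 \mid U_4 \\
U_1 = U_2 \;\Rightarrow\; (\nu x)U_1 = (\nu x)U_2 \\
D_1\,{}^{\wedge}(D_2\,{}^{\wedge}D_3) = (D_1\,{}^{\wedge}D_2)\,{}^{\wedge}D_3 \\
U_1 \mid (U_2 \mid U_3) = (U_1 \mid U_2) \mid U_3 \\
U_1 \mid U_2 = U_2 \mid U_1 \\
\emptyset \mid U = U \mid \emptyset = U \\
(\nu x)U = (\nu y)(U\{y/x\}) \quad \text{if } y \notin pd(U) \\
(\nu x)\emptyset = \emptyset \\
(\nu x)(U_1 \mid U_2) = U_1 \mid (\nu x)U_2 \quad \text{if } x \notin pd(U_1) \\
(\nu x)(\nu y)U = (\nu y)(\nu x)U
\end{array}
\]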

Note that $(\nu x)(\nu x)U = (\nu x)U$ is derivable. As an example of use of the isolation operation, consider 
that it is always possible to bring all the $\nu$ prefixes to the top level by making all the private domains 
distinct: $(\nu x)tx \mid (\nu x)tx = (\nu x)tx \mid (\nu y)ty = (\nu x)(\nu y)(tx \mid ty)$. This means that conflicts between local 
definitions can be resolved globally, while allowing local definitions to be combined without consideration 
of global conflicts. 

The reduction relation $U_1 \to U_2$ describes a single step of system evolution; it is the smallest binary 
relation on $U$ satisfying the rules below, where $\leftrightarrow$ stands for two reduction rules in opposite directions. Its 
reflexive and transitive closure $U_1 \to^* U_2$ describes multi-step system evolution. In the reduction rules, 
the single-stranded waste (t, x, xy) is automatically removed because it can be immediately identified as 
waste (as a consequence, the single strands t, x, xy need not be included in the syntax). Alternatively, 
we could have made the single-stranded waste explicit and introduced separate rules to remove it. The 
double-stranded waste instead has a special degradation rule because it requires a check over the whole 
double strand. The four basic reactions (exchange, left and right coverage, cooperation) are depicted in Figure 10. 



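Definition: Reduction 

\[
\begin{array}{ll}
\underline{D_1{}^{\wedge}t{}^{\wedge}xt{}^{\wedge}D_2} \mid tx \;\leftrightarrow\; \underline{D_1{}^{\wedge}tx{}^{\wedge}t{}^{\wedge}D_2} \mid xt & \text{Exchange} \\
\underline{D_1{}^{\wedge}t{}^{\wedge}x{}^{\wedge}D_2} \mid tx \;\to\; \underline{D_1{}^{\wedge}tx{}^{\wedge}D_2} & \text{Left coverage} \\
\underline{D_1{}^{\wedge}x{}^{\wedge}t{}^{\wedge}D_2} \mid xt \;\to\; \underline{D_1{}^{\wedge}xt{}^{\wedge}D_2} & \text{Right coverage} \\
\underline{D_1{}^{\wedge}t{}^{\wedge}xy{}^{\wedge}t{}^{\wedge}D_2} \mid tx \mid yt \;\to\; \underline{D_1{}^{\wedge}tx{}^{\wedge}yt{}^{\wedge}D_2} & \text{Cooperation} \\
D \;\to\; \emptyset \quad \text{if } D \text{ not reactive} & \text{Waste} \\
U_1 \to U_2 \;\Rightarrow\; U_1 \mid U \to U_2 \mid U & \text{Dilution} \\
U_1 \to U_2 \;\Rightarrow\; (\nu x)U_1 \to (\nu x)U_2 & \text{Isolation} \\
U_1 = U_2,\; U_2 \to U_3,\; U_3 = U_4 \;\Rightarrow\; U_1 \to U_4 & \text{Well-mixing}
\end{array}
\]

A double strand $D$ is reactive if it can react in some context; that is, by the first four rules. Hence 
it must be of the form $\underline{D_1{}^{\wedge}t{}^{\wedge}xt{}^{\wedge}D_2}$, $\underline{D_1{}^{\wedge}tx{}^{\wedge}t{}^{\wedge}D_2}$, $\underline{D_1{}^{\wedge}t{}^{\wedge}x{}^{\wedge}D_2}$, $\underline{D_1{}^{\wedge}x{}^{\wedge}t{}^{\wedge}D_2}$, or $\underline{D_1{}^{\wedge}t{}^{\wedge}xy{}^{\wedge}t{}^{\wedge}D_2}$. Among the 
unreactive (waste) double strands are thus $\underline{t}$, $\underline{x}$, $\underline{xt}$, $\underline{tx}$, $\underline{xy}$, $\underline{t{}^{\wedge}tx}$, $\underline{xt{}^{\wedge}t}$, $\underline{xt{}^{\wedge}ty}$, $\underline{xt{}^{\wedge}t{}^{\wedge}ty}$, etc. The waste rule 
is really a convenience to simplify results of calculations; more generally, as commonly done in process 
algebra, one would instead eliminate unreactive components via an observational equivalence [6]. 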
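The reactivity test is a purely local check on adjacent top-strand segments. The following sketch is one way to express it, assuming single-letter domains, the toehold t, and '^' for nicks; the function names are illustrative and not taken from the paper or from the DSD tool.

```python
T = "t"  # assumption: the single distinguished toehold


def toe(seg):      # a lone toehold segment t
    return seg == (T,)


def long1(seg):    # a lone long domain x
    return len(seg) == 1 and seg[0] != T


def signal(seg):   # a bound signal tx
    return len(seg) == 2 and seg[0] == T and seg[1] != T


def cosignal(seg): # a bound cosignal xt
    return len(seg) == 2 and seg[0] != T and seg[1] == T


def pair(seg):     # two long domains xy
    return len(seg) == 2 and seg[0] != T and seg[1] != T


def is_reactive(text):
    """True if the double strand matches a left-hand side of the first four rules."""
    segs = [tuple(s) for s in text.split("^")]
    for i in range(len(segs) - 1):
        a, b = segs[i], segs[i + 1]
        if toe(a) and (cosignal(b) or long1(b)):   # t^xt (exchange), t^x (left coverage)
            return True
        if toe(b) and (signal(a) or long1(a)):     # tx^t (exchange), x^t (right coverage)
            return True
        if toe(a) and pair(b) and i + 2 < len(segs) and toe(segs[i + 2]):
            return True                            # t^xy^t (cooperation)
    return False


print([is_reactive(s) for s in ("t^xt", "tx^t", "t^x", "x^t", "t^xy^t")])
# [True, True, True, True, True]
print([is_reactive(s) for s in ("tx", "xt^t", "t^tx", "xt^ty", "t^xyz^t")])
# [False, False, False, False, False]
```

Because the check needs to scan the whole double strand, it matches the remark above that double-stranded waste requires its own degradation rule rather than being recognized at a glance.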



4 Correctness 

If $U_1 \to^* U_2$ then $U_1$ may reduce to $U_2$, but it may also reduce to something else since $\to^*$ is a relation. 
When $U_1 \to^* U_2$ is used to state a correctness property of system reduction, we say that this is a 
may-correctness property: the system starting from $U_1$ may reduce to $U_2$, but it may also wander in a different 
section of state space and never be able to get to $U_2$ from there. A stronger property is will-correctness, 
indicated by $U_1 \to^{\forall} U_2$, and defined as $\forall U.\; U_1 \to^* U \Rightarrow U \to^* U_2$. This means that although $U_1$ may 
wander to some $U$ in some part of the state space, it will always find a path to $U_2$ from there (it cannot 
avoid finding a path to $U_2$). If $U_1 \to^{\forall} U_2$ and $U_2$ is the only terminal state, then we can say that $U_1$ must 
reduce to $U_2$. But will-correctness does not imply that reduction necessarily terminates, and in particular 
if $U \to^{\forall} U$ we can say that $U$ is reversible. Since $U_1 \to^* U_1$ holds by reflexivity, will-correctness implies 
may-correctness. (All these properties are really examples of a large class of reachability properties that 
could be expressed in a temporal logic.) 
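On a finite state space both properties reduce to reachability queries. The sketch below illustrates the two definitions on a toy graph; the state names, the graph, and the function names are all hypothetical, and any function from a state to its one-step successors could be plugged in.

```python
def reachable(start, successors):
    """All states U with start ->* U (includes start, by reflexivity)."""
    seen, todo = {start}, [start]
    while todo:
        for nxt in successors(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def may_reach(u1, u2, successors):
    """May-correctness: u1 ->* u2."""
    return u2 in reachable(u1, successors)


def will_reach(u1, u2, successors):
    """Will-correctness: every U with u1 ->* U satisfies U ->* u2."""
    return all(may_reach(u, u2, successors) for u in reachable(u1, successors))


# Toy system: A and B exchange reversibly; B can commit to GOAL or fall into TRAP.
graph = {"A": ["B"], "B": ["A", "GOAL", "TRAP"], "GOAL": [], "TRAP": []}
step = graph.__getitem__

print(may_reach("A", "GOAL", step))   # True
print(will_reach("A", "GOAL", step))  # False: TRAP cannot reach GOAL
```

The reversible A/B pair shows why will-correctness does not imply termination: the system may cycle forever, yet from every state outside TRAP a path to GOAL remains.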

It is convenient in the next examples and proofs to use a more pictographic notation for nick algebra 
expressions, to highlight the positions of the toeholds. We use the following abbreviations (${}^{\wedge}$ is still 
needed for $\underline{x{}^{\wedge}y}$): 



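Definition: Two-Domain Pictograms 

\[
\begin{array}{lll}
\ulcorner x & \text{for } t.x & \text{Signal} \\
x\urcorner & \text{for } x.t & \text{Cosignal} \\
\underline{D\ulcorner x} & \text{for } \underline{D{}^{\wedge}tx} \text{ (including } D = \emptyset) & \text{Bound signal} \\
\underline{x\urcorner D} & \text{for } \underline{xt{}^{\wedge}D} \text{ (including } D = \emptyset) & \text{Bound cosignal} \\
\underline{D \sqcup D'} & \text{for } \underline{D{}^{\wedge}t{}^{\wedge}D'} \text{ (including } D = \emptyset \text{ or } D' = \emptyset) & \text{Bottom toehold}
\end{array}
\]

For example, the transducer from Figure 3 can be written as: 

\[
\begin{array}{ll}
\underline{t{}^{\wedge}xt{}^{\wedge}at{}^{\wedge}a} \mid ta \mid \underline{x{}^{\wedge}ty{}^{\wedge}ta{}^{\wedge}t} \mid yt & \text{explicit notation} \\
\underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner & \text{pictogram notation}
\end{array}
\]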



We now show that the transducer may work correctly. Because of their chemical origin, all 
components come in populations of identical molecules, and any private domain can only be private to a 
population, and not to an individual molecule. Hence we need to show that a population of transducers, 
all sharing the same private domain, may map an input population to a desired output population. 



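Proposition 1: Transducer May-Correctness 

Let $T_{xy} = (\nu a)((\underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner)^n)$, 
then $T_{xy} \mid (\ulcorner x)^n \to^* (\ulcorner y)^n$. 

Proof 

Let $T_{xay} = \underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner$ for $a \neq x, y$, so that $T_{xy} = (\nu a)((T_{xay})^n)$. We first show that $T_{xay} \mid \ulcorner x \to^* \ulcorner y$: 

\[
\begin{array}{rl}
 & T_{xay} \mid \ulcorner x \\
= & \underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \ulcorner x \\
\leftrightarrow & \underline{\ulcorner x\sqcup a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid x\urcorner \\
\leftrightarrow & \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid x\urcorner \mid a\urcorner \\
\leftrightarrow & \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner \mid \ulcorner a \\
\to & \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner \\
\to & \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner \\
\leftrightarrow & \underline{x\sqcup y\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner y \\
\to & \underline{x\urcorner y\urcorner a\urcorner} \mid \ulcorner y \\
\to & \ulcorner y
\end{array}
\]

Hence $(T_{xay} \mid \ulcorner x)^n \to^* (\ulcorner y)^n$ by induction, $(T_{xay})^n \mid (\ulcorner x)^n \to^* (\ulcorner y)^n$ by associativity, $(\nu a)((T_{xay})^n \mid (\ulcorner x)^n) \to^* (\nu a)(\ulcorner y)^n$ by isolation, and $T_{xy} \mid (\ulcorner x)^n \to^* (\ulcorner y)^n$ by $\nu$-equivalence and by definition. End proof. 

We can similarly check the may-correctness of fork and join gates: 

Proposition 2: Fork May-Correctness 

Let $F_{xyz} = (\nu a)((\underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner z\ulcorner y\ulcorner a\sqcup} \mid z\urcorner \mid y\urcorner)^n)$, 
then $F_{xyz} \mid (\ulcorner x)^n \to^* (\ulcorner y)^n \mid (\ulcorner z)^n$. 

Proposition 3: Join May-Correctness 

Let $J_{xyz} = (\nu a)(\nu b)((\underline{\sqcup x\urcorner y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner b\ulcorner z\ulcorner a\sqcup} \mid b\urcorner \mid z\urcorner \mid \underline{\sqcup b.y\sqcup})^n)$, 
then $J_{xyz} \mid (\ulcorner x)^n \mid (\ulcorner y)^n \to^* (\ulcorner z)^n$. 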

Consider now the difficulties involved in proving more interesting properties. We would like a 
transducer, for example, to work correctly in 'all possible contexts'. Unfortunately that is just not true, because 
some context could absorb the $y\urcorner$ strand, which is public, and interfere with the transducer. One would 
have to consider instead 'all possible contexts that do not interfere with $y\urcorner$'. This is a rather awkward 
notion: for compositionality one would have, for each component, to keep track of all the elements in 
the context that the component might be interfering with. Moreover, the transducer interferes with $y\urcorner$, 
and hence it interferes with (another copy or another population of) itself. 

Let us consider a simpler 'progress' property: that the transducer does not deadlock with itself. This 
can be expressed as a will-correctness property, that for any intermediate state $U$, if $T_{xy} \mid tx^n \to^* U$ then 
$U \to^* ty^n$. This appears to require an induction on all possible intermediate configurations $U$ for any 
n. Even for a fixed small n, the state space $U$ can grow very large, which suggests that automated 
state exploration tools should be useful. Note also that an induction on the length of $\to^*$ is problematic 
because of the reversible exchange rule: infinite sequences of reductions exist in almost all systems. In 
a stochastic interpretation of reduction, actual convergence can often be achieved (with measure 1), and 
this is another challenging property to prove. 

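We now illustrate how to check a will-correctness property, for a single copy of a transducer: 

Proposition 4: Will-Correctness 

$T_{xy} \mid \ulcorner x \to^{\forall} \ulcorner y$. Moreover, $\ulcorner y$ is the only reachable terminal state. 

Proof 

We show that if $T_{xy} \mid \ulcorner x \to^* U$ then $U \to^* \ulcorner y$. We enumerate all distinct states $U$, up to algebraic equality, 
arising from $T_{xy} \mid \ulcorner x$ by all possible traces, and then we check that each state can lead to $\ulcorner y$. Assume 
$x \neq y$; indentation means a branch in the derivation: 

\[
\begin{array}{rll}
01. & (\nu a)\; \underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \ulcorner x & \\
02. & \leftrightarrow (\nu a)\; \underline{\ulcorner x\sqcup a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid x\urcorner & \\
03. & \leftrightarrow (\nu a)\; \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid x\urcorner \mid a\urcorner & \\
04. & \leftrightarrow (\nu a)\; \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner \mid \ulcorner a & \\
05. & \to (\nu a)\; \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner & \\
06. & \to (\nu a)\; \underline{x\ulcorner y\sqcup a\urcorner} \mid y\urcorner \mid x\urcorner & \\
07. & \leftrightarrow (\nu a)\; \underline{x\sqcup y\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner y & \\
08. & \to (\nu a)\; \underline{x\urcorner y\urcorner a\urcorner} \mid \ulcorner y & \\
09. & \to \ulcorner y & \\
10. & \quad \leftrightarrow (\nu a)\; \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \underline{x\sqcup y\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner y & \to 07 \\
11. & \quad \to (\nu a)\; \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \underline{x\urcorner y\urcorner a\urcorner} \mid \ulcorner y & \to 08 \\
12. & \quad \to (\nu a)\; \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \ulcorner y & \to 09 \\
13. & \quad \leftrightarrow (\nu a)\; \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\sqcup y\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner a \mid \ulcorner y & \to 10 \\
14. & \quad \to (\nu a)\; \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\urcorner y\urcorner a\urcorner} \mid \ulcorner a \mid \ulcorner y & \to 11 \\
15. & \quad \to (\nu a)\; \underline{\ulcorner x\ulcorner a\sqcup a} \mid \ulcorner a \mid \ulcorner y & \to 12
\end{array}
\]

All other states (up to algebraic equality) can be reduced to these states by well-mixing. We can then 
check that all these states have a path to state 9. The case for $x = y$ is similar: the state graph is the same 
because, as can be seen above, there is never both an x redex and a different y redex in the same state, and 
when two x signals or cosignals can be chosen, it does not matter which one is chosen, by well-mixing. 
End proof. 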
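A state enumeration of this kind can be mechanized. The following is a minimal sketch rather than the procedure used in the paper: it assumes single-letter domains, treats the private domain a as an ordinary distinct name, encodes a soup as a multiset of strands, and applies the waste rule as an explicit step, so the number of states it visits need not match the fifteen listed above. All function names are illustrative.

```python
from collections import Counter

T = "t"  # the single toehold; any other letter is a long domain


def double(text):
    """'t^xt^a' -> ('D', (('t',), ('x', 't'), ('a',))); '^' marks a nick."""
    return ("D", tuple(tuple(seg) for seg in text.split("^")))


def single(text):
    """'tx' -> ('S', ('t', 'x'))."""
    return ("S", tuple(text))


def rule_instances(segs):
    """Yield (consumed singles, new segments, released singles) for every
    exchange, coverage, or cooperation redex in one double strand."""
    segs = list(segs)
    for i in range(len(segs) - 1):
        a, b, head, rest = segs[i], segs[i + 1], segs[:i], segs[i + 2:]
        if a == (T,) and len(b) == 2 and b[0] != T and b[1] == T:
            yield [(T, b[0])], head + [(T, b[0]), (T,)] + rest, [b]  # exchange
        if b == (T,) and len(a) == 2 and a[0] == T and a[1] != T:
            yield [(a[1], T)], head + [(T,), (a[1], T)] + rest, [a]  # exchange, reversed
        if a == (T,) and len(b) == 1 and b[0] != T:
            yield [(T, b[0])], head + [(T, b[0])] + rest, []         # left coverage
        if b == (T,) and len(a) == 1 and a[0] != T:
            yield [(a[0], T)], head + [(a[0], T)] + rest, []         # right coverage
        if a == (T,) and len(b) == 2 and T not in b and rest[:1] == [(T,)]:
            pair = [(T, b[0]), (b[1], T)]
            yield pair, head + pair + rest[1:], []                   # cooperation


def freeze(soup):
    """Canonical, hashable form of a multiset of strands."""
    return frozenset((sp, n) for sp, n in soup.items() if n > 0)


def successors(state):
    soup, result = Counter(dict(state)), set()
    for species in list(soup):
        if species[0] != "D":
            continue
        instances = list(rule_instances(species[1]))
        if not instances:  # waste rule: an unreactive double strand degrades
            instances = [([], None, [])]
        for consumed, new_segs, released in instances:
            nxt = soup.copy()
            nxt[species] -= 1
            for s in consumed:
                nxt[("S", s)] -= 1
            if min(nxt.values()) < 0:
                continue  # a required single strand is not in the soup
            if new_segs is not None:
                nxt[("D", tuple(new_segs))] += 1
            for s in released:
                nxt[("S", s)] += 1
            result.add(freeze(nxt))
    return result


def reachable(start):
    seen, todo = {start}, [start]
    while todo:
        for nxt in successors(todo.pop()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


# One transducer (left half, ta, right half, yt) plus the input tx.
start = freeze(Counter([double("t^xt^at^a"), single("ta"),
                        double("x^ty^ta^t"), single("yt"), single("tx")]))
goal = freeze(Counter([single("ty")]))
states = reachable(start)
will_correct = all(goal in reachable(u) for u in states)
terminal = {u for u in states if not successors(u)}
print(will_correct, terminal == {goal})  # True True
```

The same functions can be pointed at other initial soups, for example two transducers sharing a public domain, to look for reachable terminal states other than the intended one.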

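For transducer composition, the may-correctness property $T_{xy} \mid T_{yz} \mid (\ulcorner x)^n \to^* (\ulcorner z)^n$ follows simply from 
Proposition 1, but even just the will-correctness property $T_{xy} \mid T_{yz} \mid \ulcorner x \to^{\forall} \ulcorner z$ (including $x = z$ and $y = z$ 
and $x = y = z$) does not follow from Proposition 4, and requires the analysis of a product state space. 
For example, $T_{xy} \mid T_{yx}$ can absorb the inputs $\ulcorner x \mid \ulcorner y$ sequentially (converting $\ulcorner x$ to a second $\ulcorner y$ and then 
$\ulcorner y$ to $\ulcorner x$) or in parallel (each transducer starting to process an input before producing an output). In fact, 
consider the following transducer that uses a public 'a' domain instead of a private one: 

\[
T_{xay} = \underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner
\]

$T_{xay}$ by itself satisfies may and will-correctness as shown above for $T_{xy}$, and so does $T_{yax}$. But the 
two together do not satisfy the will-correctness property of just producing $\ulcorner x$ on input $\ulcorner x$, because the 
following 'crosstalk' derivation is possible, where in the third step $a\urcorner$ goes to the 'wrong' gate: 

\[
\begin{array}{rl}
 & T_{xay} \mid T_{yax} \mid \ulcorner x \\
= & \underline{\sqcup x\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\ulcorner a\sqcup} \mid x\urcorner \mid \ulcorner x \\
\leftrightarrow & \underline{\ulcorner x\sqcup a\urcorner a} \mid \ulcorner a \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\ulcorner a\sqcup} \mid x\urcorner \mid x\urcorner \\
\leftrightarrow & \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\ulcorner a\sqcup} \mid x\urcorner \mid x\urcorner \mid a\urcorner \\
\leftrightarrow & \underline{\ulcorner x\ulcorner a\sqcup a} \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\sqcup a\urcorner} \mid x\urcorner \mid x\urcorner \mid \ulcorner a \\
\to & \underline{\ulcorner x\ulcorner a\ulcorner a} \mid \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\sqcup a\urcorner} \mid x\urcorner \mid x\urcorner \\
\to & \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\ulcorner x\sqcup a\urcorner} \mid x\urcorner \mid x\urcorner \\
\leftrightarrow & \underline{x\ulcorner y\ulcorner a\sqcup} \mid y\urcorner \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\sqcup x\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner x \\
\to & \underline{x\ulcorner y\ulcorner a\sqcup} \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid \underline{y\urcorner x\urcorner a\urcorner} \mid x\urcorner \mid \ulcorner x \\
\to & \underline{x\ulcorner y\ulcorner a\sqcup} \mid \underline{\sqcup y\urcorner a\urcorner a} \mid \ulcorner a \mid x\urcorner \mid \ulcorner x
\end{array}
\]

The last state is final (no further progress can be made), and is not just the expected $\ulcorner x$ (which can 
be obtained by a different derivation). Moreover, no $\ulcorner y$ is ever produced. The system is deadlocked 
in a state where the output $\ulcorner x$ has been produced, but many other active components have been left to 
interfere with future operation. However, that last state, if supplied with an additional $\ulcorner y$, then unblocks 
and reduces just to $\ulcorner y \mid \ulcorner x$. Hence, although $T_{xay} \mid T_{yax} \mid \ulcorner x \not\to^{\forall} \ulcorner x$, we have that $T_{xay} \mid T_{yax} \mid \ulcorner x \mid \ulcorner y \to^{\forall} 
\ulcorner x \mid \ulcorner y$. That means that a large population of such gates in practice does not deadlock easily over an 
input population of $\ulcorner x$: each pair of stuck gates can be unblocked by another gate correctly producing 
a $\ulcorner y$, and it is very unlikely that a large fraction of gates ends up being blocked. This can be seen in 
stochastic simulations of large populations, and also in Ordinary Differential Equation simulations with 
unit concentration of $T_{xay} \mid T_{yax}$, where the concentration of the residual $\ulcorner a$ tends asymptotically to zero. 
Hence, another interesting property of these systems is that, even though small populations may deadlock, 
large populations may converge to an almost-correct solution with high probability. 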



5 Testing 

Gate and circuit designs have been tested with the DSD tool [7]. We give a simple example here, testing 
a combination of two fork and four join gates in the following configuration, where yv, yw, zv, zw are 
four output domains (i.e., yv does not mean y.v in this section). 




Since fork and join gates accept inputs and produce outputs in a specific order, one should not expect 
identical rates of production of yv, yw, zv, zw. (If desired, one can mix populations of symmetric gates, to 
achieve symmetric behavior.) In Figure 11 we see an Ordinary Differential Equations simulation with 
unit rates for toehold binding and unbinding, and with concentrations of 1.0 for the input signals and 
10.0 for the gates; hence 10% of each gate is consumed during the computation. The system has a total 
of 54 single strand species, 108 double strand species, and 172 reactions, and therefore 162 ODEs. At 
time 3 (left), yv is ahead out of the gates, with zw trailing last. At time 30 (middle left) yv and yw are 
closer, and zv and zw are closer. At time 300 (middle right) the computation has reached 90% completion 
with similar output quantities approaching the expected 0.5 concentration. The higher curve of the fourth 
graph shows the total accumulation of the four garbage species of the form [t^ _]:[_ t^] for the join gates, 
indicating that all the gates are being converted to waste. One can further examine the trajectories of all 
the species in the system to check that no deadlock occurs, and that all the structures are turned to output 
or to waste. 
Figure 11: Testing a fork/join circuit. 
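
The expected 0.5 output level can be reproduced with a much coarser model than the 162-ODE system. The sketch below is an assumption-laden abstraction, not the DSD model: each fork is collapsed to a unimolecular reaction x -> y + z, each join to a bimolecular reaction y + v -> yv, all with a hypothetical rate constant k = 1, and the gates are treated as inexhaustible. It integrates the resulting mass-action equations with forward Euler steps.

```python
def simulate(t_end=300.0, dt=0.01, k=1.0):
    """Forward-Euler integration of an idealized fork/join circuit."""
    # Concentrations of the abstract signals; inputs x and u start at 1.0.
    c = dict(x=1.0, u=1.0, y=0.0, z=0.0, v=0.0, w=0.0,
             yv=0.0, yw=0.0, zv=0.0, zw=0.0)
    forks = [("x", "y", "z"), ("u", "v", "w")]           # a -> b1 + b2
    joins = [("y", "v", "yv"), ("y", "w", "yw"),
             ("z", "v", "zv"), ("z", "w", "zw")]         # a + b -> out
    for _ in range(int(round(t_end / dt))):
        d = dict.fromkeys(c, 0.0)                        # derivatives at this step
        for a, b1, b2 in forks:
            r = k * c[a]
            d[a] -= r
            d[b1] += r
            d[b2] += r
        for a, b, out in joins:
            r = k * c[a] * c[b]
            d[a] -= r
            d[b] -= r
            d[out] += r
        for s in c:
            c[s] += dt * d[s]
    return c

result = simulate()
for name in ("yv", "yw", "zv", "zw"):
    print(name, round(result[name], 3))
```

Because this abstraction is symmetric in its four joins, all outputs rise together; it cannot show the ordering effects (yv ahead, zw last) that come from the internal steps of the real gates, only the common limit that they approach.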



6 Conclusions 

We have shown how to implement fork and join gates via simple two-domain structures, and how to 
implement them in a 'clean' way that automatically removes all active garbage. In essence, we have 
given an implementation of the higher-level strand algebra of [1]. But is this implementation correct? 
We have provided a formal framework where we can perform calculations and study such questions, 
and we have discussed some simple correctness definitions and some complex behavioral properties. A 
formal proof of absence of gate interference under all possible combinations and numbers of gates and 
inputs will require an extensive amount of case analysis, which likely needs to be automated, as well 
as the identification of appropriate invariants. Alternatively, one may gain confidence in the designs by 
simulation testing. 

Acknowledgments 

Figures were prepared with the DSD tool [7]. I would like to thank the members of the Molecular 
Programming Project at Caltech and U. Washington for many tutorials and discussions. 

References 

[1] L. Cardelli. Strand Algebras for DNA Computing. In DNA Computing and Molecular Programming. LNCS 
5877, Springer, October 2009, pp 12-24. 

[2] Y. Benenson, T. Paz-Elizur, R. Adar, E. Keinan, Z. Livneh, E. Shapiro. Programmable and Autonomous 
Computing Machine made of Biomolecules. Nature, 414(22), November 2001. 

[3] W. Fontana. Pulling Strings. Science 314(8), 2006. 

[4] S. J. Green, D. Lubrich, A. J. Turberfield. DNA Hairpins: Fuel for Autonomous DNA Devices. Biophysical 
Journal 91, October 2006, 2966-2975. 

[5] M. Hagiya. Towards Molecular Programming. In G. Ciobanu, G. Rozenberg, (Eds.) Modelling in Molecular 
Biology. Springer, 2004. 

[6] R. Milner. Communicating and Mobile Systems: The π-Calculus. Cambridge University Press, 1999. 

[7] A. Phillips, L. Cardelli. A Programming Language for Composable DNA Circuits. Journal of the Royal 
Society Interface, August 2009 6:S419-S436. 

[8] G. Seelig, D. Soloveichik, D.Y. Zhang, E. Winfree. Enzyme-Free Nucleic Acid Logic Circuits. Science 
314(8), 2006. 

[9] D. Soloveichik, G. Seelig, E. Winfree. DNA as a Universal Substrate for Chemical Kinetics. PNAS 107 no. 
12, 5393-5398. 




[10] B. Yurke, A. P. Mills Jr. Using DNA to Power Nanostructures. Genetic Programming and Evolvable Machines 
4(2), 111-122, Kluwer, 2003. 

[11] D. Y. Zhang, A. J. Turberfield, B. Yurke, E. Winfree. Engineering Entropy-driven Reactions and Networks 
Catalyzed by DNA. Science, 318:1121-1125, 2007. 

7 Appendix 

7.1 May-Correctness of binary Fork and Join gates 

Proposition 2: $F^n_{xyz}$ May-Correctness 

Let $F^n_{xyz} = (\nu a)((\underline{t.xt.at.a} \mid ta \mid \underline{x.tz.ty.ta.t} \mid zt \mid yt)^n)$, 

then $F^n_{xyz} \mid tx^n \to^{*} ty^n \mid tz^n$. 

Proof 

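Let $F_{xayz} = \underline{t.xt.at.a} \mid ta \mid \underline{x.tz.ty.ta.t} \mid zt \mid yt$ for $a \neq x,y,z$, so that $F^n_{xyz} = (\nu a)((F_{xayz})^n)$. We first show 
that $F_{xayz} \mid tx \to^{*} ty \mid tz$: 

\[
\begin{array}{ll}
 & F_{xayz} \mid tx \\
= & \underline{t.xt.at.a} \mid ta \mid \underline{x.tz.ty.ta.t} \mid zt \mid yt \mid tx \\
\leftrightarrow & \underline{tx.t.at.a} \mid ta \mid \underline{x.tz.ty.ta.t} \mid zt \mid yt \mid xt \\
\leftrightarrow & \underline{tx.ta.t.a} \mid \underline{x.tz.ty.ta.t} \mid zt \mid yt \mid xt \mid at \\
\leftrightarrow & \underline{tx.ta.t.a} \mid \underline{x.tz.ty.t.at} \mid zt \mid yt \mid xt \mid ta \\
\to & \underline{tx.ta.ta} \mid \underline{x.tz.ty.t.at} \mid zt \mid yt \mid xt \\
\to & \underline{x.tz.ty.t.at} \mid zt \mid yt \mid xt \\
\leftrightarrow & \underline{x.tz.t.yt.at} \mid zt \mid xt \mid ty \\
\leftrightarrow & \underline{x.t.zt.yt.at} \mid xt \mid ty \mid tz \\
\to & \underline{xt.zt.yt.at} \mid ty \mid tz \\
\to & ty \mid tz
\end{array}
\]

Hence $(F_{xayz} \mid tx)^n \to^{*} (ty \mid tz)^n$ by induction, $(F_{xayz})^n \mid tx^n \to^{*} ty^n \mid tz^n$ by associativity, 
$(\nu a)((F_{xayz})^n \mid tx^n) \to^{*} (\nu a)(ty^n \mid tz^n)$ by isolation, and $F^n_{xyz} \mid tx^n \to^{*} ty^n \mid tz^n$ by 
$\nu$-equivalence and by $F^n_{xyz}$ definition. End proof. 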
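
The induction step that the proof of Proposition 2 leaves implicit can be spelled out as follows. This is a sketch under one assumption that is not restated in this appendix, namely that $\to^{*}$ is closed under parallel contexts (if $P \to^{*} Q$ then $P \mid R \to^{*} Q \mid R$); $P$ abbreviates $F_{xayz} \mid tx$ and $Q$ abbreviates $ty \mid tz$.

```latex
\begin{align*}
P^{1} &= P \to^{*} Q = Q^{1}
  && \text{(the single-gate derivation)} \\
P^{n+1} &= P^{n} \mid P \\
  &\to^{*} Q^{n} \mid P
  && \text{(induction hypothesis, in context } P\text{)} \\
  &\to^{*} Q^{n} \mid Q = Q^{n+1}
  && \text{(single-gate derivation, in context } Q^{n}\text{)}
\end{align*}
```

The argument picks one particular derivation, in which each copy of the gate runs to completion on its own. Since all $n$ copies share the same $a$, crosstalk derivations of the kind discussed earlier also exist; may-correctness only requires that at least one derivation reaches the expected result.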

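Proposition 3: $J^n_{xyz}$ May-Correctness 

Let $J^n_{xyz} = (\nu a)(\nu b)((\underline{t.xt.yt.at.a} \mid ta \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t})^n)$, 
then $J^n_{xyz} \mid tx^n \mid ty^n \to^{*} tz^n$. 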

Proof 

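Let $J_{xyaz} = \underline{t.xt.yt.at.a} \mid ta \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t}$ for $a \neq x,y,z$, so that $J^n_{xyz} = (\nu a)((J_{xyaz})^n)$. We 
first show that $J_{xyaz} \mid tx \mid ty \to^{*} tz$: 

\[
\begin{array}{ll}
 & J_{xyaz} \mid tx \mid ty \\
= & \underline{t.xt.yt.at.a} \mid ta \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t} \mid tx \mid ty \\
\leftrightarrow & \underline{tx.t.yt.at.a} \mid ta \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t} \mid ty \mid xt \\
\leftrightarrow & \underline{tx.ty.t.at.a} \mid ta \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t} \mid xt \mid yt \\
\leftrightarrow & \underline{tx.ty.ta.t.a} \mid \underline{x.tb.tz.ta.t} \mid bt \mid zt \mid \underline{t.by.t} \mid xt \mid yt \mid at \\
\leftrightarrow & \underline{tx.ty.ta.t.a} \mid \underline{x.tb.tz.t.at} \mid bt \mid zt \mid \underline{t.by.t} \mid xt \mid yt \mid ta \\
\to & \underline{tx.ty.ta.ta} \mid \underline{x.tb.tz.t.at} \mid bt \mid zt \mid \underline{t.by.t} \mid xt \mid yt \\
\to & \underline{x.tb.tz.t.at} \mid bt \mid zt \mid \underline{t.by.t} \mid xt \mid yt \\
\leftrightarrow & \underline{x.tb.t.zt.at} \mid bt \mid \underline{t.by.t} \mid xt \mid yt \mid tz \\
\leftrightarrow & \underline{x.t.bt.zt.at} \mid \underline{t.by.t} \mid xt \mid yt \mid tz \mid tb \\
\to & \underline{xt.bt.zt.at} \mid \underline{t.by.t} \mid yt \mid tz \mid tb \\
\to & \underline{t.by.t} \mid yt \mid tz \mid tb \\
\to^{*} & tz
\end{array}
\]

Hence $(J_{xyaz} \mid tx \mid ty)^n \to^{*} tz^n$ by induction, $(J_{xyaz})^n \mid tx^n \mid ty^n \to^{*} tz^n$ by associativity, 
$(\nu a)((J_{xyaz})^n \mid tx^n \mid ty^n) \to^{*} (\nu a)tz^n$ by isolation, and $J^n_{xyz} \mid tx^n \mid ty^n \to^{*} tz^n$ by 
$\nu$-equivalence and by $J^n_{xyz}$ definition. End proof. 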



7.2 DSD Script for Figure 11 

This script can be run from a browser in DSD [7] using 'deterministic' simulation. 
http://research.microsoft.com/en-us/projects/dna/default.aspx 

directive sample 300.0 1000
directive plot <t^ yv>; <t^ yw>; <t^ zv>; <t^ zw>; sum([t^ _]:[_ t^])
new t@1.0,1.0

(* Fork gate: one input x, two outputs y and z *)
def F(N, x, y, z) =
  new a
  ( N * <t^ a>
  | N * <y t^>
  | N * <z t^>
  | N * t^:[x t^]:[a t^]:[a]
  | N * [x]:[t^ z]:[t^ y]:[t^ a]:t^ )

(* Join gate: two inputs x and y, one output z *)
def J(N, x, y, z) =
  new a new b
  ( N * <t^ a>
  | N * <b t^>
  | N * <z t^>
  | N * t^:[x t^]:[y t^]:[a t^]:[a]
  | N * [x]:[t^ b]:[t^ z]:[t^ a]:t^
  | N * t^:[b y]:t^ )

(* Two forks feeding four joins, with unit input signals x and u *)
( F(10, x, y, z)
| F(10, u, v, w)
| J(10, y, v, yv)
| J(10, y, w, yw)
| J(10, z, v, zv)
| J(10, z, w, zw)
| 1 * <t^ x>
| 1 * <t^ u> )



